Goto

Collaborating Authors

 Niagara Falls


Document Intelligence in the Era of Large Language Models: A Survey

Wang, Weishi, Hu, Hengchang, Zhang, Zhijie, Li, Zhaochen, Shao, Hongxin, Dahlmeier, Daniel

arXiv.org Artificial Intelligence

Document AI (DAI) has emerged as a vital application area, and is significantly transformed by the advent of large language models (LLMs). While earlier approaches relied on encoder-decoder architectures, decoder-only LLMs have revolutionized DAI, bringing remarkable advancements in understanding and generation. This survey provides a comprehensive overview of DAI's evolution, highlighting current research attempts and future prospects of LLMs in this field. We explore key advancements and challenges in multimodal, multilingual, and retrieval-augmented DAI, while also suggesting future research directions, including agent-based approaches and document-specific foundation models. This paper aims to provide a structured analysis of the state-of-the-art in DAI and its implications for both academic and practical applications.


Towards Interpretable Drug-Drug Interaction Prediction: A Graph-Based Approach with Molecular and Network-Level Explanations

Chen, Mengjie, Zhang, Ming, Qu, Cunquan

arXiv.org Artificial Intelligence

Drug-drug interactions (DDIs) represent a critical challenge in pharmacology, often leading to adverse drug reactions with significant implications for patient safety and healthcare outcomes. While graph-based methods have achieved strong predictive performance, most approaches treat drug pairs independently, overlooking the complex, context-dependent interactions unique to drug pairs. Additionally, these models struggle to integrate biological interaction networks and molecular-level structures to provide meaningful mechanistic insights. In this study, we propose MolecBioNet, a novel graph-based framework that integrates molecular and biomedical knowledge for robust and interpretable DDI prediction. By modeling drug pairs as unified entities, MolecBioNet captures both macro-level biological interactions and micro-level molecular influences, offering a comprehensive perspective on DDIs. The framework extracts local subgraphs from biomedical knowledge graphs and constructs hierarchical interaction graphs from molecular representations, leveraging classical graph neural network methods to learn multi-scale representations of drug pairs. To enhance accuracy and interpretability, MolecBioNet introduces two domain-specific pooling strategies: context-aware subgraph pooling (CASPool), which emphasizes biologically relevant entities, and attention-guided influence pooling (AGIPool), which prioritizes influential molecular substructures. The framework further employs mutual information minimization regularization to enhance information diversity during embedding fusion. Experimental results demonstrate that MolecBioNet outperforms state-of-the-art methods in DDI prediction, while ablation studies and embedding visualizations further validate the advantages of unified drug pair modeling and multi-scale knowledge integration.


From thermodynamics to protein design: Diffusion models for biomolecule generation towards autonomous protein engineering

Li, Wen-ran, Cadet, Xavier F., Medina-Ortiz, David, Davari, Mehdi D., Sowdhamini, Ramanathan, Damour, Cedric, Li, Yu, Miranville, Alain, Cadet, Frederic

arXiv.org Artificial Intelligence

Protein design with desirable properties has been a significant challenge for many decades. Generative artificial intelligence is a promising approach and has achieved great success in various protein generation tasks. Notably, diffusion models stand out for their robust mathematical foundations and impressive generative capabilities, offering unique advantages in certain applications such as protein design. In this review, we first give the definition and characteristics of diffusion models and then focus on two strategies: Denoising Diffusion Probabilistic Models and Score-based Generative Models, where DDPM is the discrete form of SGM. Furthermore, we discuss their applications in protein design, peptide generation, drug discovery, and protein-ligand interaction. Finally, we outline the future perspectives of diffusion models to advance autonomous protein design and engineering. The E(3) group consists of all rotations, reflections, and translations in three-dimensions. The equivariance on the E(3) group can keep the physical stability of the frame of each amino acid as much as possible, and we reflect on how to keep the diffusion model E(3) equivariant for protein generation.


SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision

Zheng, Kangjie, Liang, Siyue, Yang, Junwei, Feng, Bin, Liu, Zequn, Ju, Wei, Xiao, Zhiping, Zhang, Ming

arXiv.org Artificial Intelligence

SMILES, a crucial textual representation of molecular structures, has garnered significant attention as a foundation for pre-trained language models (LMs). However, most existing pre-trained SMILES LMs focus solely on the single-token level supervision during pre-training, failing to fully leverage the substructural information of molecules. This limitation makes the pre-training task overly simplistic, preventing the models from capturing richer molecular semantic information. Moreover, during pre-training, these SMILES LMs only process corrupted SMILES inputs, never encountering any valid SMILES, which leads to a train-inference mismatch. To address these challenges, we propose SMI-Editor, a novel edit-based pre-trained SMILES LM. SMI-Editor disrupts substructures within a molecule at random and feeds the resulting SMILES back into the model, which then attempts to restore the original SMILES through an editing process. This approach not only introduces fragment-level training signals, but also enables the use of valid SMILES as inputs, allowing the model to learn how to reconstruct complete molecules from these incomplete structures. As a result, the model demonstrates improved scalability and an enhanced ability to capture fragment-level molecular information. Experimental results show that SMI-Editor achieves state-of-the-art performance across multiple downstream molecular tasks, and even outperforming several 3D molecular representation models.


Self-Compositional Data Augmentation for Scientific Keyphrase Generation

Houbre, Mael, Boudin, Florian, Daille, Beatrice, Aizawa, Akiko

arXiv.org Artificial Intelligence

State-of-the-art models for keyphrase generation require large amounts of training data to achieve good performance. However, obtaining keyphrase-labeled documents can be challenging and costly. To address this issue, we present a self-compositional data augmentation method. More specifically, we measure the relatedness of training documents based on their shared keyphrases, and combine similar documents to generate synthetic samples. The advantage of our method lies in its ability to create additional training samples that keep domain coherence, without relying on external data or resources. Our results on multiple datasets spanning three different domains, demonstrate that our method consistently improves keyphrase generation. A qualitative analysis of the generated keyphrases for the Computer Science domain confirms this improvement towards their representativity property.


Artificial Intelligence of Things: A Survey

Siam, Shakhrul Iman, Ahn, Hyunho, Liu, Li, Alam, Samiul, Shen, Hui, Cao, Zhichao, Shroff, Ness, Krishnamachari, Bhaskar, Srivastava, Mani, Zhang, Mi

arXiv.org Artificial Intelligence

The proliferation of the Internet of Things (IoT) such as smartphones, wearables, drones, and smart speakers, as well as the gigantic amount of data they capture, have revolutionized the way we work, live, and interact with the world. Equipped with sensing, computing, networking, and communication capabilities, these devices are able to collect, analyze and transmit a wide range of data including images, videos, audio, texts, wireless signals, physiological signals from individuals and the physical world. In recent years, advancements in Artificial Intelligence (AI), particularly in deep learning (DL)/deep neural network (DNN), foundation models, and Generative AI, have propelled the integration of AI with IoT, making the concept of Artificial Intelligence of Things (AIoT) a reality. The synergy between IoT and modern AI enhances decision making, improves human-machine interactions, and facilitates more efficient operations, making AIoT one of the most exciting and promising areas that have the potential to fundamentally transform how people perceive and interact with the world. As illustrated in Figure 1, at its core, AIoT is grounded on three key components: sensing, computing, and networking & communication.


RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction

Zhou, Changjian, Zhang, Xin, Li, Jiafeng, Song, Jia, Xiang, Wensheng

arXiv.org Artificial Intelligence

Recent studies suggest that drug-drug interaction (DDI) prediction via computational approaches has significant importance for understanding the functions and co-prescriptions of multiple drugs. However, the existing silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs), or fail to explicitly model and fuse the multi-scale drug feature representations for better prediction. In this study, we propose RGDA-DDI, a residual graph attention network (residual-GAT) and dual-attention based framework for drug-drug interaction prediction. A residual-GAT module is introduced to simultaneously learn multi-scale feature representations from drugs and DDPs. In addition, a dual-attention based feature fusion block is constructed to learn local joint interaction representations. A series of evaluation metrics demonstrate that the RGDA-DDI significantly improved DDI prediction performance on two public benchmark datasets, which provides a new insight into drug development.


A Review of Large Language Models and Autonomous Agents in Chemistry

Ramos, Mayk Caldas, Collison, Christopher J., White, Andrew D.

arXiv.org Artificial Intelligence

Large language models (LLMs) are emerging as a powerful tool in chemistry across multiple domains. In chemistry, LLMs are able to accurately predict properties, design new molecules, optimize synthesis pathways, and accelerate drug and material discovery. A core emerging idea is combining LLMs with chemistry-specific tools like synthesis planners and databases, leading to so-called "agents." This review covers LLMs' recent history, current capabilities, design, challenges specific to chemistry, and future directions. Particular attention is given to agents and their emergence as a cross-chemistry paradigm. Agents have proven effective in diverse domains of chemistry, but challenges remain. It is unclear if creating domain-specific versus generalist agents and developing autonomous pipelines versus "co-pilot" systems will accelerate chemistry. An emerging direction is the development of multi-agent systems using a human-in-the-loop approach. Due to the incredibly fast development of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.


Analysis of Atom-level pretraining with Quantum Mechanics (QM) data for Graph Neural Networks Molecular property models

Arjona-Medina, Jose, Nugmanov, Ramil

arXiv.org Artificial Intelligence

Despite the rapid and significant advancements in deep learning for Quantitative Structure-Activity Relationship (QSAR) models, the challenge of learning robust molecular representations that effectively generalize in real-world scenarios to novel compounds remains an elusive and unresolved task. This study examines how atom-level pretraining with quantum mechanics (QM) data can mitigate violations of assumptions regarding the distributional similarity between training and test data and therefore improve performance and generalization in downstream tasks. In the public dataset Therapeutics Data Commons (TDC), we show how pretraining on atom-level QM improves performance overall and makes the activation of the features distributes more Gaussian-like which results in a representation that is more robust to distribution shifts. To the best of our knowledge, this is the first time that hidden state molecular representations are analyzed to compare the effects of molecule-level and atom-level pretraining on QM data.


Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

Chu, Zhixuan, Wang, Yan, Zhu, Feng, Yu, Lu, Li, Longfei, Gu, Jinjie

arXiv.org Artificial Intelligence

The advent of large language models (LLMs) such as ChatGPT, PaLM, and GPT-4 has catalyzed remarkable advances in natural language processing, demonstrating human-like language fluency and reasoning capacities. This position paper introduces the concept of Professional Agents (PAgents), an application framework harnessing LLM capabilities to create autonomous agents with controllable, specialized, interactive, and professional-level competencies. We posit that PAgents can reshape professional services through continuously developed expertise. Our proposed PAgents framework entails a tri-layered architecture for genesis, evolution, and synergy: a base tool layer, a middle agent layer, and a top synergy layer. This paper aims to spur discourse on promising real-world applications of LLMs. We argue the increasing sophistication and integration of PAgents could lead to AI systems exhibiting professional mastery over complex domains, serving critical needs, and potentially achieving artificial general intelligence.